Understanding and identifying amino acid repeats
Abstract
Amino acid repeats (AARs) are abundant in protein sequences and play particular roles in protein function and evolution. Simple repeat patterns, generated by DNA slippage, are prone to length variation and point mutations in the repeat regions. Because their length is variable, they risk loss of normal function or gain of abnormal function, either of which can lead to disease. Repeats with complex patterns are mostly functional domain repeats, such as the well-known leucine-rich repeat and WD repeat, which are frequently involved in protein–protein interaction. They are mainly derived from internal gene duplication events and are stabilized by 'gate-keeper' residues, which play crucial roles in preventing inter-domain aggregation. AARs are widely distributed in proteomes across a variety of taxonomic ranges and are especially abundant in eukaryotic proteins. However, their specific evolutionary and functional scenarios are still poorly understood. Identifying AARs in protein sequences is the first step toward investigating their biological function and evolutionary mechanism. In principle, this is an NP-hard problem, as most repeat fragments have been shaped by a series of sophisticated evolutionary events and have become latent periodic patterns. It is not possible to define a uniform criterion for detecting and verifying the various repeat patterns. Instead, different algorithms based on different strategies have been developed to cope with different repeat patterns. In this review, we attempt to describe the amino acid repeat-detection algorithms currently available and compare their strategies based on an in-depth analysis of the biological significance of protein repeats.
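As a rough illustration of the simplest detection case only, the sketch below finds perfect runs of a single residue in a protein sequence. It is not one of the algorithms covered by the review: the function name `find_homopolymer_runs` and the default threshold of seven residues are assumptions chosen for illustration, and the exact-match approach cannot find degenerate or domain-level repeats, which need the more elaborate strategies the abstract refers to.

```python
import re


def find_homopolymer_runs(sequence, min_length=7):
    """Return (residue, start, end) for each perfect single-residue run.

    `min_length` is an illustrative threshold (an assumption, not a
    standard). `start` is 0-based and `end` is exclusive, as in slicing.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # (.) captures one residue; \1{n,} requires at least n further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [
        (match.group(1), match.start(), match.end())
        for match in pattern.finditer(sequence.upper())
    ]
```

Calling `find_homopolymer_runs("MKQQQQQQQQAAALLLLLLLG")` reports the run of eight Q residues and the run of seven L residues, and skips the three-residue A run because it falls below the threshold.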
Related resources
Detection, Characterization and Evolution of Internal Repeats in Chitinases of Known 3-D Structure
Chitinase proteins have evolved and diversified in almost all organisms, ranging from prokaryotes to eukaryotes. During evolution, internal repeats may appear in the amino acid sequences of proteins and alter their structural and functional features. Here we deciphered the internal repeats from Chitinase and characterized the structural similarities between them. Out of 24 diverse Chitinase sequence...
A Novel algorithm for identifying low-complexity regions in a protein sequence
MOTIVATION: We consider the problem of identifying low-complexity regions (LCRs) in a protein sequence. LCRs are regions of biased composition, normally consisting of different kinds of repeats. RESULTS: We define new complexity measures to compute the complexity of a sequence based on a given scoring matrix, such as BLOSUM 62. Our complexity measures also consider the order of amino acids in t...
Elevated evolutionary rate in genes with homopolymeric amino acid repeats constituting nondisordered structure.
Homopolymeric amino acid repeats are tandem repeats of single amino acids. About 650 genes in the human genome are known to have repeats of this kind comprising seven residues or more. According to their evolutionary conservation, we classified the repeats into three categories: those whose length is conserved among mammals (CM), those whose length differs among nonprimate mammals but is cons...
Comparative analysis of amino acid repeats in rodents and humans.
Amino acid tandem repeats, also called homopolymeric tracts, are extremely abundant in eukaryotic proteins. To gain insight into the genome-wide evolution of these regions in mammals, we analyzed the repeat content in a large data set of rat-mouse-human orthologs. Our results show that human proteins contain more amino acid repeats than rodent proteins and that trinucleotide repeats are also mo...
Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights
Microsatellites or simple sequence repeats (SSRs) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acid repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise due to a combination of various codons rep...